Groovy Regular Expressions

2023-08-26 06:11 | Source: compiled from the web

Groovy Regular Expression Syntax

Regular expressions in Groovy are, underneath, built on the classes of the JDK's java.util.regex package.

Some simple regular expressions:

def reg1 = ~'he*llo'
def reg2 = /he*llo/
println "reg1 type is ${reg1.class}"
println "reg2 type is ${reg2.class}"
println "hello".matches(reg1)
println "hello".matches(reg2)

Output:

reg1 type is class java.util.regex.Pattern
reg2 type is class java.lang.String
true
true

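The code above defines the regular expression in two ways. reg1 applies the pattern operator ~ to a string, so it is a compiled java.util.regex.Pattern. reg2 uses the slashy /.../ delimiter, which produces an ordinary String; it acts as a regular expression only when passed to a method such as matches, which compiles it. The benefit of the slashy form is that backslashes inside it need no escaping.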
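A small sketch to check the distinction above: applying ~ to a single-quoted, double-quoted or slashy string always yields a compiled Pattern, while a bare slashy string stays a String. The variable names are illustrative only.

```groovy
import java.util.regex.Pattern

// The pattern operator ~ compiles any kind of string literal into a Pattern
def fromSingle = ~'he*llo'
def fromDouble = ~"he*llo"
def fromSlashy = ~/he*llo/

assert [fromSingle, fromDouble, fromSlashy].every { it instanceof Pattern }
// All three hold the same source regex text
assert [fromSingle, fromDouble, fromSlashy].every { it.pattern() == 'he*llo' }

// Without ~, a slashy literal is just a String
def plain = /he*llo/
assert plain instanceof String

// e* accepts zero or more e's, but the two l's are mandatory
assert 'hllo' ==~ fromSlashy
assert 'heeello' ==~ fromSlashy
assert !('helo' ==~ fromSlashy)
```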

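Note: in Groovy, =~ is the find operator. It is placed after a string, takes a regular expression on its right-hand side, and returns a java.util.regex.Matcher object:

def val1 = "hello" =~ "he*llo"
println val1.class
print val1.matches()
//class java.util.regex.Matcher
//true

Note: in Groovy, ==~ is the match operator. It is also placed after a string and takes a regular expression, but it returns a Boolean. It returns true only if the whole string on the left matches the regular expression on the right:

def val1 = "hello" ==~ "he*llo"
println val1.class
print val1
//class java.lang.Boolean
//true

Escaping and raw strings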
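The sketch below spells one regex (one or more digits, a literal dot, more digits) with four Groovy string literals and checks that every literal yields the same characters. It also uses the dollar-slashy form $/.../$, which the article does not otherwise cover; it assumes a Groovy version that supports dollar-slashy strings (any recent release does).

```groovy
// One regex written with four kinds of Groovy string literal
def inDouble       = "\\d+\\.\\d+"   // double quotes: every regex backslash is doubled
def inSingle       = '\\d+\\.\\d+'   // single quotes: no interpolation, but still doubled
def inSlashy       = /\d+\.\d+/      // slashy: backslashes are kept as written
def inDollarSlashy = $/\d+\.\d+/$    // dollar-slashy: backslashes are kept as written too

// All four literals contain exactly the same characters
assert [inSingle, inSlashy, inDollarSlashy].every { it == inDouble }
// ...including exactly three backslash characters
assert inSlashy.count('\\') == 3

assert '3.14' ==~ inDouble
assert !('314' ==~ inDouble)   // the literal dot is required
```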

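Regular expressions use special sequences to match text, for example \w, which stands for a word character [a-zA-Z_0-9]. These sequences usually begin with \, so how Groovy string literals treat the backslash matters. First, recall how the quote styles differ: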

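def val1 = "test value"
println 'value is ${val1}'
println "value is ${val1}"
//value is ${val1}
//value is test value

When you build a regex string with double quotes, each \ in the regex must be written as \\, for example:

def reg1 = "hello \\w*"
def reg2 = /hello \w*/
println "hello world" ==~ reg1
println "hello world" ==~ reg2
//true
//true

A single-quoted string is not interpolated: ${...} is kept literally, as the first example shows. It is not a raw string, however, because backslash escapes are still processed. Writing the regex with single quotes and a single backslash therefore fails:

def reg1 = 'hello \w*' // changing it to 'hello \\w*' makes it work
println "hello world" ==~ reg1
//error: \w is not a valid escape sequence in a quoted Groovy string, so single quotes still need \\w

Pattern and Matcher

Pattern.matches and Pattern.matcher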
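To place the Java API next to the Groovy operators, here is a sketch, assuming a plain Groovy script, that checks Pattern.matches against ==~ and Pattern.matcher against =~. It also shows that matches() needs the whole input to match, while find() accepts a match anywhere in it.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

def regex = /^hello \w*world$/
def input = "hello world"

// Pattern.matches(regex, input): compile and match the whole input in one call
assert Pattern.matches(regex, input)
// ...equivalent to compiling, creating a Matcher and calling matches()
assert Pattern.compile(regex).matcher(input).matches()
// ...and to Groovy's match operator
assert input ==~ regex

// Pattern.matcher(input) corresponds to Groovy's find operator =~
Matcher viaJava   = Pattern.compile(/world/).matcher(input)
Matcher viaGroovy = input =~ /world/

// matches() fails because "world" is only part of the input; find() succeeds
assert !viaJava.matches()
assert viaGroovy.find()
assert viaGroovy.group() == 'world'
```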

Matcher matcher(CharSequence input): returns a Matcher that will match the given input against this pattern.

def reg = ~/^hello \w*world$/
def str = "hello world"
def matcher = reg.matcher(str)
println matcher.class
//class java.util.regex.Matcher

In Groovy, the =~ operator creates the same Matcher in a single step:

def matcher = "hello world" =~ /^hello \w*world$/
println matcher.class

static boolean matches(String regex, CharSequence input): compiles the given regular expression and attempts to match the entire input against it. Groovy's ==~ operator performs the same whole-input match:

println "hello world" ==~ /^hello \w*world$/
//true

Capturing groups in Matcher

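A Matcher is the engine that interprets a Pattern against a particular input. In Java you obtain one by calling the Pattern's matcher method; in Groovy the =~ operator is enough. The Matcher holds the state of matching one regular expression against one input string, including every match it finds. Capturing groups come from the parentheses in a regular expression: each pair of parentheses forms a group. Groups are numbered by counting their opening parentheses from left to right, so when groups are nested, an outer group always gets a smaller number than the groups inside it. Group 0 always stands for the entire expression: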

(A(BC))
group 0: (A(BC))
group 1: (A(BC))
group 2: (BC)
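The numbering rule can be checked by substituting the letters a, b, c for A, B, C. The second pattern, ((a)(b(c))), is an extra illustration added here to show how deeper nesting shifts the numbers.

```groovy
// (A(BC)) with the letters a, b, c standing in for A, B, C
def m1 = 'abc' =~ /(a(bc))/
assert m1.matches()            // groups can only be read after a successful match
assert m1.groupCount() == 2    // two opening parentheses, two capturing groups
assert m1.group(0) == 'abc'    // group 0: the whole match
assert m1.group(1) == 'abc'    // group 1: (a(bc)), the first "("
assert m1.group(2) == 'bc'     // group 2: (bc), the second "("

// Deeper nesting: groups are numbered in the order their "(" appear
def m2 = 'abc' =~ /((a)(b(c)))/
assert m2.matches()
assert m2.groupCount() == 4
assert (1..4).collect { m2.group(it) } == ['abc', 'a', 'bc', 'c']
```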

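To number the groups of an expression, read it from the left and add 1 to the group number at every opening parenthesis. Groups let you capture, for each match, the substring of the input that each group matched: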

def str = "hello world hello"
def reg = /((el)(l))o/
def matcher = str =~ reg
def num = 0
while (matcher.find()) {
    println "match number ${num}"
    num++
    def groupnum = matcher.groupCount()
    println "group count ${matcher.groupCount()}"
    println "group string ${matcher.group()}"
    println "group 0 string ${matcher.group(0)}"
    for (id in 1..groupnum) {
        println "group ${id} string ${matcher.group(id)}"
        println "start index is ${matcher.start(id)} and end index is ${matcher.end(id)}"
    }
}

Output:

match number 0
group count 3
group string ello
group 0 string ello
group 1 string ell
start index is 1 and end index is 4
group 2 string el
start index is 1 and end index is 3
group 3 string l
start index is 3 and end index is 4
match number 1
group count 3
group string ello
group 0 string ello
group 1 string ell
start index is 13 and end index is 16
group 2 string el
start index is 13 and end index is 15
group 3 string l
start index is 15 and end index is 16

Notes:

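groupCount does not include group 0; it equals the number of capturing groups, i.e. capturing parentheses (non-capturing groups such as (?:...) are not counted).

matcher.group() and matcher.group(0) return the same thing: the whole subsequence matched by the pattern. With an argument n, group(n) returns the subsequence captured by group n.

start and end return the offsets of the match, or, when given a group number, of the part matched by that capturing group. end is always the index of the last matched character plus 1.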
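The notes above can be verified with a short sketch. The input key=value and the pattern are illustrative choices; (?:=) is a non-capturing group included only to show that it is not counted.

```groovy
def m = 'key=value' =~ /(\w+)(?:=)(\w+)/
assert m.find()

// groupCount() ignores group 0 and the non-capturing group (?:=)
assert m.groupCount() == 2

// group() and group(0) both return the whole match
assert m.group() == m.group(0)
assert m.group(0) == 'key=value'

// end() is exclusive: it is the index just past the last matched character
assert m.start(1) == 0 && m.end(1) == 3   // "key" occupies indices 0..2
assert m.start(2) == 4 && m.end(2) == 9   // "value" occupies indices 4..8
assert 'key=value'.substring(m.start(2), m.end(2)) == 'value'
```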

Because the GDK adds a getAt method to Matcher, Groovy also lets you access matches and their captured groups by index:

def reg = ~/h(el)(lo)/
def str = 'hello world hello nihao'
def matcher = str =~ reg
println "first matched substring"
println matcher[0]
println matcher[0][0]
println matcher[0][1]
println matcher[0][2]
println "second matched substring"
println matcher[1]
println matcher[1][0]
println matcher[1][1]
println matcher[1][2]

Output:

first matched substring
[hello, el, lo]
hello
el
lo
second matched substring
[hello, el, lo]
hello
el
lo

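Indexing follows a simple rule. The first index selects a match: 0 is the first substring that matched, 1 the second, and so on. Because this pattern has capturing groups, each match is returned as a list containing the whole matched substring followed by the substring captured by each group, in group-number order. The second index picks from that list, so matcher[i][n] gives the same result as matcher.group(n) after the (i+1)-th successful find().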
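A sketch of the indexing rule on the same input. It assumes the GDK behavior that a pattern without capturing groups yields plain strings instead of lists, and it uses the GDK getCount() method (read here as the count property) to get the number of matches.

```groovy
def text = 'hello world hello nihao'

// With capturing groups, each index yields a list: [whole match, group 1, group 2]
def withGroups = text =~ /h(el)(lo)/
assert withGroups.count == 2                    // "nihao" does not match
assert withGroups[1] == ['hello', 'el', 'lo']   // the second match
assert withGroups[1][2] == 'lo'

// Without capturing groups, each index yields the matched String itself
def noGroups = text =~ /h\w+/
assert noGroups.count == 3
assert noGroups[0] == 'hello'
assert noGroups[2] == 'hao'                     // the "h" inside "nihao"
```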

Resetting a Matcher

Resetting a matcher involves two methods: find(int) and reset().

find(int start) resets the matcher and then searches for the next match beginning at the given index:

def reg = /el/
def str = "hello world hello"
def matcher = str =~ reg
while (matcher.find()) {
    println matcher.group()
}
matcher.find(0) // reset the matcher and search again from index 0
// find(0) has already consumed the first match, so the next find() call returns the second match
println "reset the matcher"
while (matcher.find()) {
    println matcher.group()
}
//el
//el
//reset the matcher
//el

reset() discards all matching state, so the following find() calls start over from the beginning of the input:

def reg = /el/
def str = "hello world hello"
def matcher = str =~ reg
while (matcher.find()) {
    println matcher.group()
}
matcher.reset()
println "reset the matcher"
while (matcher.find()) {
    println matcher.group()
}
//el
//el
//reset the matcher
//el
//el

Summary

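In short, Groovy provides its own operators for building the objects that regular expression work depends on, but underneath everything comes back to Java's two core regex classes, Pattern and Matcher. Groovy often behaves like syntactic sugar: it lets you do Java programming in a scripting style.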
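To illustrate the syntactic-sugar point, the sketch below collects every match of el first with plain Pattern/Matcher calls and then with the GDK methods String.findAll, String.eachMatch and Matcher iteration (collect), and checks that all of them produce the same list.

```groovy
import java.util.regex.Pattern

def text = 'hello world hello'

// Plain java.util.regex style: compile, create a Matcher, loop over find()
def javaStyle = []
def m = Pattern.compile('el').matcher(text)
while (m.find()) {
    javaStyle << m.group()
}

// GDK shortcuts built on the same Pattern/Matcher machinery
def viaFindAll = text.findAll(/el/)              // every match as a List<String>
def viaCollect = (text =~ /el/).collect { it }   // a Matcher can be iterated directly
def viaEachMatch = []
text.eachMatch(/el/) { viaEachMatch << it }      // no groups, so the closure gets each match String

assert javaStyle == ['el', 'el']
assert [viaFindAll, viaCollect, viaEachMatch].every { it == javaStyle }
```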
